Read in our dataset and display summary statistics

Here, we attach data and print the first few rows to understand how it looks and what we have access to from the World Management Survey

data <- read.csv("management_and_health_034.csv")
attach(data)
head(data)
ABCDEFGHIJ0123456789
 
 
hospital_id
<int>
management
<dbl>
zmanagement
<dbl>
lmba
<dbl>
mba
<dbl>
zami_rate
<dbl>
hos_bed
<dbl>
hos_lbed
<dbl>
hos_fprofit
<int>
112.318182-0.13677580.47000360.60.3727507772.596.6497480
222.454545-0.15938510.40546510.51.1042490367.725.9073220
332.363636-0.12730130.64185390.9-0.2483238760.596.6340950
442.363636-0.23508370.33647220.41.2353290808.436.6950940
551.818182-0.90322970.18232160.20.22353121105.007.0076010
662.409091-0.06165860.00000000.00.0276812724.256.5851360

Printing some in-depth summary statistics:

describe(data,fast=TRUE)
ABCDEFGHIJ0123456789
 
 
vars
<dbl>
n
<dbl>
mean
<dbl>
sd
<dbl>
median
<dbl>
min
<dbl>
max
<dbl>
hospital_id14531.001364e+03766.692218391476.00000001.0000001930.0000000
management24532.765626e+000.594303072.80000001.0500004.3000000
zmanagement34535.094226e-010.930597950.5303775-2.1632122.9001230
lmba44532.459330e-010.201766790.22314350.0000000.6931472
mba54533.060927e-010.280101850.25000000.0000001.0000000
zami_rate64531.895272e-021.01495585-0.1038135-2.1921244.8339860
hos_bed74533.463373e+02332.85185550245.000000018.0000003000.0000000
hos_lbed84535.243901e+004.986543235.4889380-99.0000008.0063680
hos_fprofit94535.960265e-020.237010920.00000000.0000001.0000000
hos_nfprofit104533.200883e-010.467025980.00000000.0000001.0000000

Exploratory Data Analysis

Creating Appendix B: Figure 1

Below, this plot illustrates how the quality of management changes based on the country the hospital was surveyed in.

ggplot(data=data, aes(x=country, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="country",
         y="zmanagement",
         group="Country",
         title="Management Quality by Country")

Creating Appendix C: Table 1

As an extension of the analysis from the plot above, we will be using simple linear regression to see how related the country the hospital is in to the quality of management. Note all coefficients are very statistically significant, and Brazil is the excluded category represented by the intercept.

summary(lm(zmanagement ~ country))
## 
## Call:
## lm(formula = zmanagement ~ country)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.46432 -0.50219 -0.04971  0.50020  2.00772 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -0.18642    0.07434  -2.508 0.012502 *  
## countryCanada          0.63603    0.17499   3.635 0.000311 ***
## countrySweden          0.57471    0.14919   3.852 0.000134 ***
## countryUnited Kingdom  0.52922    0.10209   5.184  3.3e-07 ***
## countryUnited States   1.33023    0.09626  13.818  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7761 on 448 degrees of freedom
## Multiple R-squared:  0.3107, Adjusted R-squared:  0.3045 
## F-statistic: 50.47 on 4 and 448 DF,  p-value: < 2.2e-16

In contrast to the analysis above, this plot illustrates how the mortality risk changes based on the country the hospital was surveyed in.

ggplot(data=data, aes(x=country, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="country",
         y="zami_rate",
         group="Country",
         title="Heart Attack Mortality Rate by Country")

As an extension of the analysis from the plot above, we will be using simple linear regression to see how related the country the hospital is in to the mortality rate of heart attacks. Note none of the coefficients are statistically significant, and Brazil is the excluded category represented by the intercept.

summary(lm(zami_rate ~ country))
## 
## Call:
## lm(formula = zami_rate ~ country)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2735 -0.6695 -0.1333  0.5402  4.8548 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)            2.844e-08  9.756e-02   0.000    1.000
## countryCanada         -1.177e-08  2.297e-01   0.000    1.000
## countrySweden          5.355e-02  1.958e-01   0.273    0.785
## countryUnited Kingdom  8.140e-02  1.340e-01   0.607    0.544
## countryUnited States  -2.083e-02  1.263e-01  -0.165    0.869
## 
## Residual standard error: 1.019 on 448 degrees of freedom
## Multiple R-squared:  0.001773,   Adjusted R-squared:  -0.00714 
## F-statistic: 0.1989 on 4 and 448 DF,  p-value: 0.9389

Creating Appendix B: Figure 2

Below, this plot illustrates how management quality differs across countries

ggplot(data=data, aes(x=country, y=zmanagement, group=country, color=as.factor(yy))) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="yy",
         y="zmanagement",
         color="Year of Survey",
         title="Year of Survey on Management Quality, by Country")

It is very important to note that in the figure above, only the United Kingdom’s hospitals span two different sample groups (red being 2006 and green being 2009). Therefore, we should investigate this further

Creating Appendix B: Figure 3

Below, this plot illustrates how heart attack mortality rates vary across countries

ggplot(data=data, aes(x=country, y=zami_rate, group=country, color=as.factor(yy))) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="yy",
         y="zami_rate",
         color="Year of Survey",
         title="Year of Survey on Mortality Rate, by Country")

Creating Appendix B: Figure 4

Note that there does not appear to be a statistically significant relationship or different between the two survey years and the quality of management present. Because of this lack of significance, we can feel more confident by not controlling for time as it doesn’t seem necessary. More support of this decision will be presented later in this file.

ggplot() +
    geom_point(data=data %>% filter(country == "United Kingdom"),
               aes(x=yy, y=zmanagement, color=as.factor(yy))) +
    geom_smooth(method="lm",
                data=data %>% filter(country == "United Kingdom"),
                fill="grey", color="black",
                aes(x=yy, y=zmanagement, color="Line of Best Fit")) +
    theme_minimal() +
    labs(x="yy",
         y="zmanagement",
         color="Year of Survey",
         group="Line of Best Fit",
         title="Analysis of Year on Management Quality for UK Hospitals")
## `geom_smooth()` using formula = 'y ~ x'

Creating Appendix B: Figure 5

For robustness sake, we will also be completing the same analysis as the figure above but instead searching for a potential realtionship between year and mortality rate. Because of this lack of significance, we can feel more confident by not controlling for time as it doesn’t seem necessary. More support of this decision will be presented later in this file.

ggplot() +
    geom_point(data=data %>% filter(country == "United Kingdom"),
               aes(x=yy, y=zami_rate, color=as.factor(yy))) +
    geom_smooth(method="lm",
                data=data %>% filter(country == "United Kingdom"),
                fill="grey", color="black",
                aes(x=yy, y=zami_rate, color="Line of Best Fit")) +
    theme_minimal() +
    labs(x="yy",
         y="zami_rate",
         color="Year of Survey",
         group="Line of Best Fit",
         title="Analysis of Year on Mortality Rate for UK Hospitals")
## `geom_smooth()` using formula = 'y ~ x'

Creating Appendix B: Figure 6

We wanted to check if the commuting time to the nearest joint Medical Business school would be a good instrumental variable. Our hypothesis was that a hospital that is closer to a hospital would have a strong supply of good managerial knowledge and talent, so the farther away a hospital is the lower zmanagement would be on average. The plot below shows that trend, leading some evidence to using logcom_ttime as an instrument.

# zmanagement vs logcom_ttime
ggplot() +
  geom_point(data=data, aes(x=logcom_ttime, y=zmanagement), color="blue") +
  geom_smooth(method="lm",
              data=data,
              fill="grey", color="black",
              aes(x=logcom_ttime, y=zmanagement, color="Line of Best Fit")) +
  theme_minimal() +
  scale_fill_discrete("blue") +
  labs(x="ln(commuting time to nearest joint M-B)",
       y="zmanagement",
       group="Line of Best Fit",
       title="Analysis of Commuting Time to Nearest M-B on Management Quality")
## `geom_smooth()` using formula = 'y ~ x'

Creating Appendix B: Figure 7

We wanted to check if the age of the nearest joint Medical Business school would be a good instrumental variable. Our hypothesis was that a hospital that is older and more established will have a track record of sending graduates to this hospital, accumulating management skill over time, leading to higher management scores. The plot below shows that trend, leading some evidence to using com_lage as an instrument.

#zmanagement vs com_lage
ggplot() +
  # we exlude the outliers which are very negative (-99s)
  geom_point(data=data %>% filter(com_lage >= 0), aes(x=com_lage, y=zmanagement), color="blue") +
  geom_smooth(method="lm",
              data=data %>% filter(com_lage >= 0),
              fill="grey", color="black",
              aes(x=com_lage, y=zmanagement, color="Line of Best Fit")) +
  theme_minimal() +
  scale_fill_discrete("blue") +
  labs(x="ln(age of nearest M-B)",
       y="zmanagement",
       #color="Year of Survey",
       group="Line of Best Fit",
       title="Analysis of Age of Nearest M-B on zmanagement")
## `geom_smooth()` using formula = 'y ~ x'

Another variable that we wanted to investigate was the survey_reliability because of its believed importance to our initial hypothesis before going in-depth on the modeling process.

# EXPLORING SINGLE VARIABLES
# CONTINUOUS
# histogram
ggplot() +
  geom_histogram(aes(x=survey_reliability[survey_reliability>=0]))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# scatter
ggplot() +
  geom_point(aes(x=survey_reliability[survey_reliability>=0], y=zami_rate[survey_reliability>=0]))

# linear model
summary(lm(zami_rate[survey_reliability>=0] ~ survey_reliability[survey_reliability>=0]))
## 
## Call:
## lm(formula = zami_rate[survey_reliability >= 0] ~ survey_reliability[survey_reliability >= 
##     0])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2557 -0.6797 -0.1012  0.5301  4.8538 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                                  0.18871    0.20089   0.939
## survey_reliability[survey_reliability >= 0] -0.02085    0.02417  -0.863
##                                             Pr(>|t|)
## (Intercept)                                    0.348
## survey_reliability[survey_reliability >= 0]    0.389
## 
## Residual standard error: 1.019 on 445 degrees of freedom
## Multiple R-squared:  0.00167,    Adjusted R-squared:  -0.0005739 
## F-statistic: 0.7442 on 1 and 445 DF,  p-value: 0.3888
# CATEGORICAL
table(survey_reliability_miss)
## survey_reliability_miss
##   0   1 
## 447   6
data %>%
  group_by(survey_reliability_miss) %>%
  summarise(mean_zami_rate = mean(zami_rate))
ABCDEFGHIJ0123456789
survey_reliability_miss
<int>
mean_zami_rate
<dbl>
00.02047175
1-0.09421443

Preliminary Modeling

Overall, we know that the variable of interest Y will be zami_rate and that D will be zmanagement for the reasons mentioned in the report. Additionally, we need to consider the different controls that we have access to.

Validation for hos_bed

ggplot(data=data, aes(x=hos_bed, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="hos_bed",
         y="zmanagement",
         group="Country",
         title="`hos_bed` versus `zmanagement`, by `country`")

ggplot(data=data, aes(x=hos_bed, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="hos_bed",
         y="zami_rate",
         group="Country",
         title="`hos_bed` versus `zami_rate`, by `country`")

Validation for hos_fprofit (note that jitter is turned on to show the concentration of points around discrete values)

ggplot(data=data, aes(x=hos_fprofit, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="hos_fprofit",
         y="zmanagement",
         group="Country",
         title="`hos_fprofit` versus `zmanagement`, by `country`")

ggplot(data=data, aes(x=hos_fprofit, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="hos_fprofit",
         y="zami_rate",
         group="Country",
         title="`hos_fprofit` versus `zami_rate`, by `country`")

Validation for hos_nfprofit (note that jitter is turned on to show the concentration of points around discrete values)

ggplot(data=data, aes(x=hos_nfprofit, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="hos_nfprofit",
         y="zmanagement",
         group="Country",
         title="`hos_nfprofit` versus `zmanagement`, by `country`")

ggplot(data=data, aes(x=hos_nfprofit, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="hos_nfprofit",
         y="zami_rate",
         group="Country",
         title="`hos_nfprofit` versus `zami_rate`, by `country`")

Validation for grid_temp_new (note that jitter is turned on to show the concentration of points around discrete values)

ggplot(data=data, aes(x=grid_temp_new, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="grid_temp_new",
         y="zmanagement",
         group="Country",
         title="`grid_temp_new` versus `zmanagement`, by `country`")

ggplot(data=data, aes(x=grid_temp_new, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="grid_temp_new",
         y="zami_rate",
         group="Country",
         title="`grid_temp_new` versus `zami_rate`, by `country`")

ggplot(data=data, aes(x=grid_temp_new_miss, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="grid_temp_new_miss",
         y="zmanagement",
         group="Country",
         title="`grid_temp_new_miss` versus `zmanagement`, by `country`")

ggplot(data=data, aes(x=grid_temp_new_miss, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="grid_temp_new_miss",
         y="zami_rate",
         group="Country",
         title="`grid_temp_new_miss` versus `zami_rate`, by `country`")

Validation for survey_reliability (note that jitter is turned on to show the concentration of points around discrete values)

ggplot(data=data, aes(x=survey_reliability, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="survey_reliability",
         y="zmanagement",
         group="Country",
         title="`survey_reliability` versus `zmanagement`, by `country`")

ggplot(data=data, aes(x=survey_reliability, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="survey_reliability",
         y="zami_rate",
         group="Country",
         title="`survey_reliability` versus `zami_rate`, by `country`")

ggplot(data=data, aes(x=survey_reliability_miss, y=zmanagement, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="survey_reliability_miss",
         y="zmanagement",
         group="Country",
         title="`survey_reliability_miss` versus `zmanagement`, by `country`")

ggplot(data=data, aes(x=survey_reliability_miss, y=zami_rate, group=country, color=country)) +
    geom_point() +
    geom_jitter() +
    theme_minimal() +
    labs(x="survey_reliability_miss",
         y="zami_rate",
         group="Country",
         title="`survey_reliability_miss` versus `zami_rate`, by `country`")

IV First Stage and creating Appendix C: Table 3

Using the instruments we picked in the above EDA, we can create the first stage of the IV regression. We predict zmanagement using our instruments logcom_ttime, com_lage, and com_lage_miss, along with all of our controls.

stage_1_lm <- lm(zmanagement ~ logcom_ttime + com_lage + com_lage_miss + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss)
summary(stage_1_lm)
## 
## Call:
## lm(formula = zmanagement ~ logcom_ttime + com_lage + com_lage_miss + 
##     as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + 
##     hos_numcompetitors + survey_reliability + survey_reliability_miss + 
##     grid_temp_new + grid_temp_new_miss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.67523 -0.42399 -0.02025  0.45768  1.94271 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -1.220290   0.380709  -3.205  0.00145 ** 
## logcom_ttime                     -0.313476   0.099266  -3.158  0.00170 ** 
## com_lage                         -0.007723   0.059524  -0.130  0.89683    
## com_lage_miss                    -0.825983   6.122994  -0.135  0.89275    
## as.factor(country)Canada          1.044096   0.247661   4.216 3.03e-05 ***
## as.factor(country)Sweden          0.943704   0.236457   3.991 7.71e-05 ***
## as.factor(country)United Kingdom  0.855469   0.190757   4.485 9.35e-06 ***
## as.factor(country)United States   1.413720   0.144762   9.766  < 2e-16 ***
## hos_lbed                         -0.005605   0.007157  -0.783  0.43398    
## hos_fprofit                       0.298490   0.158387   1.885  0.06015 .  
## hos_nfprofit                      0.208298   0.097679   2.132  0.03352 *  
## hos_numcompetitors                0.063952   0.065307   0.979  0.32800    
## survey_reliability                0.117466   0.018549   6.333 5.99e-10 ***
## survey_reliability_miss          12.594389   1.996967   6.307 6.99e-10 ***
## grid_temp_new                    -0.001162   0.010948  -0.106  0.91549    
## grid_temp_new_miss               -0.317342   1.191559  -0.266  0.79011    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7333 on 437 degrees of freedom
## Multiple R-squared:  0.3997, Adjusted R-squared:  0.3791 
## F-statistic:  19.4 on 15 and 437 DF,  p-value: < 2.2e-16

Creating Appendix C: Table 4

We can create the second stage by predicting our estimated z_management_hat, and using that instead of zmanagement in the model.

#predict zmanagement_hat for our data using the first stage
zmanagement_hat <- predict(stage_1_lm)
iv_stage_2_lm <- lm(zami_rate ~ zmanagement_hat + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss)
summary(iv_stage_2_lm)
## 
## Call:
## lm(formula = zami_rate ~ zmanagement_hat + as.factor(country) + 
##     hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + 
##     survey_reliability + survey_reliability_miss + grid_temp_new + 
##     grid_temp_new_miss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1208 -0.6572 -0.1018  0.5040  4.5676 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                      -1.275233   0.768468  -1.659   0.0977 .
## zmanagement_hat                  -1.043021   0.434819  -2.399   0.0169 *
## as.factor(country)Canada          0.885774   0.537968   1.647   0.1004  
## as.factor(country)Sweden          0.873678   0.515995   1.693   0.0911 .
## as.factor(country)United Kingdom  0.946296   0.475245   1.991   0.0471 *
## as.factor(country)United States   1.524804   0.663373   2.299   0.0220 *
## hos_lbed                          0.005460   0.009933   0.550   0.5828  
## hos_fprofit                       0.265066   0.264416   1.002   0.3167  
## hos_nfprofit                     -0.075137   0.164739  -0.456   0.6485  
## hos_numcompetitors               -0.017080   0.100553  -0.170   0.8652  
## survey_reliability                0.104658   0.056938   1.838   0.0667 .
## survey_reliability_miss          10.998751   6.108416   1.801   0.0725 .
## grid_temp_new                     0.007524   0.015033   0.500   0.6170  
## grid_temp_new_miss                0.330931   1.643613   0.201   0.8405  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.008 on 439 degrees of freedom
## Multiple R-squared:  0.04157,    Adjusted R-squared:  0.01319 
## F-statistic: 1.465 on 13 and 439 DF,  p-value: 0.1271

We also want to check to see if we did the above IV regression correctly by using the ivreg library to check to see if they give the same results. They match coefficients, but have different p-values. We trust the ivreg library more since it is intended to do this instrumental variable regression, so we created Appendix C: Table 4 using this summary.

reg_iv <- ivreg(zami_rate ~ zmanagement + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss
                | logcom_ttime + com_lage + com_lage_miss + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss,
                data=data)
summary(reg_iv)
## 
## Call:
## ivreg(formula = zami_rate ~ zmanagement + as.factor(country) + 
##     hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + 
##     survey_reliability + survey_reliability_miss + grid_temp_new + 
##     grid_temp_new_miss | logcom_ttime + com_lage + com_lage_miss + 
##     as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + 
##     hos_numcompetitors + survey_reliability + survey_reliability_miss + 
##     grid_temp_new + grid_temp_new_miss, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.34884 -0.77235 -0.05527  0.70311  5.04501 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                      -1.275233   0.907345  -1.405   0.1606  
## zmanagement                      -1.043021   0.513398  -2.032   0.0428 *
## as.factor(country)Canada          0.885774   0.635189   1.395   0.1639  
## as.factor(country)Sweden          0.873678   0.609245   1.434   0.1523  
## as.factor(country)United Kingdom  0.946296   0.561131   1.686   0.0924 .
## as.factor(country)United States   1.524804   0.783257   1.947   0.0522 .
## hos_lbed                          0.005460   0.011728   0.466   0.6418  
## hos_fprofit                       0.265066   0.312200   0.849   0.3963  
## hos_nfprofit                     -0.075137   0.194510  -0.386   0.6995  
## hos_numcompetitors               -0.017080   0.118724  -0.144   0.8857  
## survey_reliability                0.104658   0.067228   1.557   0.1202  
## survey_reliability_miss          10.998751   7.212318   1.525   0.1280  
## grid_temp_new                     0.007524   0.017750   0.424   0.6719  
## grid_temp_new_miss                0.330931   1.940643   0.171   0.8647  
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value  
## Weak instruments   3 437     3.333  0.0194 *
## Wu-Hausman         1 438     4.055  0.0447 *
## Sargan             2  NA     4.685  0.0961 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.19 on 439 degrees of freedom
## Multiple R-Squared: -0.3361, Adjusted R-squared: -0.3757 
## Wald test: 1.051 on 13 and 439 DF,  p-value: 0.4013

Fixed-Effect Modeling

As referenced in the model, a lot of the motivation behind using country fixed effects is explained in the EDA section of this file. Here, we will focus on the modeling.

Creating Appendix C: Table 2, this is the final model that was chosen to estimate the effect of management quality on heart attack mortality rates by using fixed-effects.

summary(plm(zami_rate ~ zmanagement + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="country", data=data))
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = zami_rate ~ zmanagement + hos_lbed + hos_fprofit + 
##     hos_nfprofit + hos_numcompetitors + survey_reliability + 
##     survey_reliability_miss + grid_temp_new + grid_temp_new_miss, 
##     data = data, model = "within", index = "country")
## 
## Unbalanced Panel: n = 5, T = 24-161, N = 453
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -2.39234 -0.64847 -0.10465  0.53967  4.70908 
## 
## Coefficients:
##                           Estimate Std. Error t-value Pr(>|t|)   
## zmanagement             -0.1823892  0.0648790 -2.8112 0.005156 **
## hos_lbed                 0.0086901  0.0097777  0.8888 0.374613   
## hos_fprofit             -0.0350270  0.2172882 -0.1612 0.872009   
## hos_nfprofit            -0.2664809  0.1338493 -1.9909 0.047111 * 
## hos_numcompetitors      -0.1194097  0.0863806 -1.3824 0.167562   
## survey_reliability       0.0038517  0.0265082  0.1453 0.884538   
## survey_reliability_miss  0.1929170  2.8521172  0.0676 0.946103   
## grid_temp_new            0.0094896  0.0149646  0.6341 0.526324   
## grid_temp_new_miss       0.6965218  1.6295025  0.4274 0.669265   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    464.8
## Residual Sum of Squares: 444.12
## R-Squared:      0.044487
## Adj. R-Squared: 0.016191
## F-statistic: 2.27098 on 9 and 439 DF, p-value: 0.017093

As mentioned in the model, we will include yy06 and then yy09 to illustrate the decreased certainty from the model when adding in these controls.

yy06

summary(plm(zami_rate ~ zmanagement + yy06 + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="country", data=data))
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = zami_rate ~ zmanagement + yy06 + hos_lbed + hos_fprofit + 
##     hos_nfprofit + hos_numcompetitors + survey_reliability + 
##     survey_reliability_miss + grid_temp_new + grid_temp_new_miss, 
##     data = data, model = "within", index = "country")
## 
## Unbalanced Panel: n = 5, T = 24-161, N = 453
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -2.44562 -0.64589 -0.10919  0.54880  4.70558 
## 
## Coefficients:
##                           Estimate Std. Error t-value Pr(>|t|)   
## zmanagement             -0.1796177  0.0650326 -2.7620 0.005987 **
## yy06                     0.1331070  0.1872783  0.7107 0.477621   
## hos_lbed                 0.0086568  0.0097833  0.8849 0.376720   
## hos_fprofit             -0.0363793  0.2174191 -0.1673 0.867193   
## hos_nfprofit            -0.2672596  0.1339293 -1.9955 0.046604 * 
## hos_numcompetitors      -0.1194632  0.0864294 -1.3822 0.167613   
## survey_reliability       0.0048565  0.0265608  0.1828 0.855004   
## survey_reliability_miss  0.2672526  2.8556419  0.0936 0.925480   
## grid_temp_new            0.0094309  0.0149733  0.6298 0.529123   
## grid_temp_new_miss       0.6908478  1.6304412  0.4237 0.671979   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    464.8
## Residual Sum of Squares: 443.61
## R-Squared:      0.045587
## Adj. R-Squared: 0.015081
## F-statistic: 2.0921 on 10 and 438 DF, p-value: 0.023888

yy06

summary(plm(zami_rate ~ zmanagement + yy09 + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="country", data=data))
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = zami_rate ~ zmanagement + yy09 + hos_lbed + hos_fprofit + 
##     hos_nfprofit + hos_numcompetitors + survey_reliability + 
##     survey_reliability_miss + grid_temp_new + grid_temp_new_miss, 
##     data = data, model = "within", index = "country")
## 
## Unbalanced Panel: n = 5, T = 24-161, N = 453
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -2.44562 -0.64589 -0.10919  0.54880  4.70558 
## 
## Coefficients:
##                           Estimate Std. Error t-value Pr(>|t|)   
## zmanagement             -0.1796177  0.0650326 -2.7620 0.005987 **
## yy09                    -0.1331070  0.1872783 -0.7107 0.477621   
## hos_lbed                 0.0086568  0.0097833  0.8849 0.376720   
## hos_fprofit             -0.0363793  0.2174191 -0.1673 0.867193   
## hos_nfprofit            -0.2672596  0.1339293 -1.9955 0.046604 * 
## hos_numcompetitors      -0.1194632  0.0864294 -1.3822 0.167613   
## survey_reliability       0.0048565  0.0265608  0.1828 0.855004   
## survey_reliability_miss  0.2672526  2.8556419  0.0936 0.925480   
## grid_temp_new            0.0094309  0.0149733  0.6298 0.529123   
## grid_temp_new_miss       0.6908478  1.6304412  0.4237 0.671979   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    464.8
## Residual Sum of Squares: 443.61
## R-Squared:      0.045587
## Adj. R-Squared: 0.015081
## F-statistic: 2.0921 on 10 and 438 DF, p-value: 0.023888

Alternatively, we will compare the exact same controls as our final FE model but instead use the region_survey fixed-effects and display the reduced certainty it brings into our model.

summary(plm(zami_rate ~ zmanagement + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="region_survey", data=data))
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = zami_rate ~ zmanagement + hos_lbed + hos_fprofit + 
##     hos_nfprofit + hos_numcompetitors + survey_reliability + 
##     survey_reliability_miss + grid_temp_new + grid_temp_new_miss, 
##     data = data, model = "within", index = "region_survey")
## 
## Unbalanced Panel: n = 75, T = 1-39, N = 453
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -2.5933593 -0.5940206 -0.0026787  0.5082217  3.1061505 
## 
## Coefficients:
##                           Estimate Std. Error t-value Pr(>|t|)  
## zmanagement             -0.1720557  0.0691502 -2.4881  0.01328 *
## hos_lbed                 0.0030032  0.0115874  0.2592  0.79564  
## hos_fprofit              0.0699421  0.2527916  0.2767  0.78218  
## hos_nfprofit            -0.1236863  0.1568130 -0.7888  0.43076  
## hos_numcompetitors      -0.1487774  0.0980047 -1.5181  0.12985  
## survey_reliability       0.0083172  0.0285024  0.2918  0.77060  
## survey_reliability_miss  0.8708110  3.0858147  0.2822  0.77795  
## grid_temp_new           -0.0076748  0.0294098 -0.2610  0.79427  
## grid_temp_new_miss      -1.1438812  3.1884373 -0.3588  0.71998  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    374.87
## Residual Sum of Squares: 362.93
## R-Squared:      0.031843
## Adj. R-Squared: -0.18593
## F-statistic: 1.34848 on 9 and 369 DF, p-value: 0.2103

Conclusion

Through this script, we hope you are able to gain more insight into our modeling process, the technicality of the work we were able to complete, and the cautionary steps that we took in order to try our best and accurately model the causal relationship that we are interested in.